Abstract

We study the asymptotic performance of linear predictors of continuous-time
stationary processes from observations at n sampling instants on a fixed
observation interval. We consider both optimal and simpler choices of predictor
coefficients, and uniform sampling as well as nonuniform sampling tailored to
the statistics of the process under prediction. We concentrate on stationary
processes with rational spectral densities and obtain the asymptotic
performance for cases with no and with one quadratic-mean derivative. The
analytical results are supplemented by numerical examples depicting small and
large sample size performance.
I.  INTRODUCTION


A continuous-time stationary process is to be predicted linearly from
observations at a finite number of sampling instants from a fixed observation
interval. What is the best location of the sampling points, for fixed sample
size or asymptotically as the sample size tends to infinity? At how many
points should the process be sampled for the predictor to achieve a specified
mean-square prediction error? If the process is sampled uniformly, what should
the sampling rate be for a desirable predictor performance, measured in
mean-square error? These questions are answered by studying the performance of
discrete-time linear predictors as the sample size tends to infinity.

We consider uniform sampling as well as nonuniform sampling, e.g. at fixed
quantiles of a probability density over the observation interval. We are
strongly interested in the performance of uniform sampling, but we would also
like to know whether appropriately chosen nonuniform sampling may result in
appreciable improvement of performance. We consider optimal and certain
suboptimal linear predictors. The optimal predictors require the inversion of
an $n \times n$ matrix for each sample size $n$. The suboptimal predictors we
use require the solution of an integral equation, but then the choice of
predictor coefficients for each sample size is very simple.

We concentrate on stationary processes with rational spectral density.
When the process has no quadratic-mean derivative, both the optimal and
suboptimal predictors have the same rate $n^{-2}$ and the same asymptotic
constant, which depends on the sampling design. This allows us to check
whether the asymptotically optimal sampling design is uniform or nonuniform,
and, in the latter case, to compare its performance with that of uniform
sampling. The small sample performance of uniform versus nonuniform sampling,
and of optimal versus suboptimal predictors, is illustrated by specific
numerical examples.



We next consider the case where the process has exactly one quadratic-mean
derivative. When the suboptimal predictor is used, uniform sampling has rate
$n^{-1}$ while nonuniform sampling schemes can be designed with rate $n^{-4}$.
This striking discrepancy in the rate of convergence is due to the presence of
derivatives of delta functions in the continuous-time linear predictor filter.
The asymptotic performance of the optimal predictor is an open question at
present. In this case, the comparison in small sample performance and in
asymptotics of optimal versus suboptimal predictors is done via specific
numerical examples.

In the context of regression problems, sampling designs were considered by
Sacks and Ylvisaker [3]-[5]. Sampling designs for suboptimal integral
estimators and detectors were considered by Schoenfelder [6] and by Cambanis
and Masry [1], respectively. The current paper considers sampling designs in
the context of prediction of stationary processes and provides new analytical
results, extending the earlier works [1], [6], for processes with one
quadratic-mean derivative.

The organization of the paper is as follows. The formulation of the problem
is given in Section II. Theoretical and numerical results for processes having
no quadratic-mean derivative are presented in subsection A of Section II. The
corresponding results for processes with one quadratic-mean derivative are
given in subsection B of Section II. The proof of the asymptotic performance
for processes with one quadratic-mean derivative is relegated to the Appendix.


II.  PERFORMANCE  OF  DISCRETE-TIME  PREDICTORS 


We consider a stationary process $X = \{X(t),\ -\infty < t < \infty\}$ with
mean zero and covariance function $R(t)$. We want to predict linearly the
value of the process at time $s > T$ from $n$ observations of the process taken
at the sampling instants $D = \{t_k\}_{k=1}^{n}$ from the interval
$I = [-T, T]$, namely from the observations $\{X(t_k)\}_{k=1}^{n}$.
The linear predictor $\hat{X}_D(s)$ has the form

$$\hat{X}_D(s) = \sum_{k=1}^{n} c_{D,k}\, X(t_k) = c_D' X_D$$

where the row vectors $c_D'$ and $X_D'$ are defined by
$c_D' = (c_{D,1}, \dots, c_{D,n})$ and $X_D' = (X(t_1), \dots, X(t_n))$. Our
goal is to choose the $n$ sampling points $D$ and the predictor coefficients
$c_D$ so that the resulting mean-square prediction error

$$e_D^2(s) = E[X(s) - \hat{X}_D(s)]^2 = R(0) - 2 c_D' R_D(s) + c_D' R_D c_D$$

should be as small as possible for fixed sample size $n$ or asymptotically as
$n$ tends to infinity. Here $R_D'(s)$ is the row vector
$(R(s - t_1), \dots, R(s - t_n))$ and $R_D$ is the covariance matrix
$[R(t_j - t_k)]_{j,k=1}^{n}$.

For a specified set of $n$ sampling points $D$, a natural choice of predictor
coefficients $c_D$ is the optimal coefficients $c_D = R_D^{-1} R_D(s)$, for
which the linear predictor

$$\hat{X}_D(s) = R_D'(s)\, R_D^{-1} X_D \tag{2.2}$$

is the projection of $X(s)$ on the data space generated by
$\{X(t),\ t \in D\}$, and the corresponding minimum mean-square prediction
error $e_D^2(s)$ is given by

$$e_D^2(s) = E[X^2(s)] - E[\hat{X}_D^2(s)] = R(0) - R_D'(s)\, R_D^{-1} R_D(s). \tag{2.3}$$


For a specified $n$ point sampling design $D$, finding the optimal
coefficients involves inverting an $n \times n$ covariance matrix, and the
weight $c_{D,k}$ for the $k$-th observation $X(t_k)$ depends on all sampling
points $\{t_j\}_{j=1}^{n}$. Simpler, nonoptimal, weights are naturally
suggested by the form of the continuous data predictor whenever it is known,
and we discuss this next.
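The optimal-coefficients computation can be sketched in a few lines. The block below is a minimal illustration, not the authors' code: the function names `solve` and `optimal_predictor` and the exponential covariance used in the usage lines are assumptions introduced here. It solves the normal equations $R_D c_D = R_D(s)$ and evaluates the error $R(0) - c_D' R_D(s)$.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                     # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def optimal_predictor(cov, points, s):
    """Return (coefficients, mean-square error) of the best linear predictor of X(s)."""
    r_d = [[cov(tj - tk) for tk in points] for tj in points]   # covariance matrix
    r_s = [cov(s - tk) for tk in points]                       # cross-covariances
    c = solve(r_d, r_s)
    mse = cov(0.0) - sum(ci * ri for ci, ri in zip(c, r_s))
    return c, mse

# Hypothetical check with a first-order Markov covariance exp(-|t|):
# only the sample nearest to s should receive a nonzero weight.
markov_cov = lambda t: math.exp(-abs(t))
coeffs, mse = optimal_predictor(markov_cov, [-1.0, 0.0, 1.0], 2.0)
```

For the Markov covariance the weights on the two earlier samples vanish and the error equals $1 - e^{-2}$, which is the familiar property that only the most recent observation matters.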

When the entire continuous record $\{X(t),\ t \in I = [-T, T]\}$ is available,
we denote the optimal linear mean-square error predictor of $X(s)$ by
$\hat{X}_I(s)$ and its minimum mean-square error by

$$e_I^2(s) = E[X(s) - \hat{X}_I(s)]^2 = R(0) - E[\hat{X}_I(s) X(s)]. \tag{2.4}$$

For any discrete-time predictor $\hat{X}_D(s)$ (with or without optimal
coefficients $c_D$) we have

$$e_D^2(s) = E[X(s) - \hat{X}_D(s)]^2
  = E\{[X(s) - \hat{X}_I(s)] + [\hat{X}_I(s) - \hat{X}_D(s)]\}^2
  = e_I^2(s) + E[\hat{X}_I(s) - \hat{X}_D(s)]^2 \tag{2.5}$$

where the cross term vanishes since the error $X(s) - \hat{X}_I(s)$ is
orthogonal to the space generated by the continuous data
$\{X(t),\ t \in I\}$, to which both $\hat{X}_I(s)$ and $\hat{X}_D(s)$ belong.
Thus the excess error of any discrete-time predictor is given by

$$e_D^2(s) - e_I^2(s) = E[\hat{X}_I(s) - \hat{X}_D(s)]^2 \tag{2.6}$$

and in particular the excess error of the optimal-coefficients discrete-time
predictor (2.2) is

$$e_D^2(s) - e_I^2(s) = E[\hat{X}_I^2(s)] - E[\hat{X}_D^2(s)] \tag{2.7}$$

by the projection theorem.

Since the excess mean-square error (2.6) of every discrete-time predictor
is measured by how well the discrete-time predictor approximates the
continuous-time predictor in mean-square, simple nonoptimal-coefficients
discrete-time predictors can be obtained from the form of the continuous-time
predictor whenever the latter is known explicitly in the time domain. In
particular when the process $X$ has a rational spectral density and has
precisely $k$ mean-square derivatives, as we assume henceforth, then the
optimal linear continuous-time predictor has the form [2]

$$\hat{X}_I(s) = \sum_{j=0}^{k} \{a_j X^{(j)}(-T) + b_j X^{(j)}(T)\}
  + \int_{-T}^{T} c(t) X(t)\, dt \tag{2.8}$$

where the coefficients $\{a_j\}$, $\{b_j\}$ and the filter $c(t)$ all depend on
the time $s$ at which the process is predicted; when we wish to emphasize this
dependence we will write $a_j(s)$, $b_j(s)$, $c(t,s)$. The values of
$\{a_j\}$, $\{b_j\}$ and $c(t)$ are obtained as the solutions of the linear
integral equation

$$R(s - t) = \int_{-T}^{T} h(\tau) R(t - \tau)\, d\tau, \quad |t| < T, \tag{2.9}$$

$$h(t) = \sum_{j=0}^{k} (-1)^j \{a_j \delta^{(j)}(t + T) + b_j \delta^{(j)}(t - T)\} + c(t). \tag{2.10}$$
The form (2.8) for the optimal continuous-time predictor suggests generating
nonoptimal-coefficients discrete-time predictors as follows. First the
endpoints $\pm T$ should be included in the sampling design:
$-T = t_1 < t_2 < \dots < t_{n-1} < t_n = T$. Then to obtain the discrete-data
predictor $\hat{X}_D(s)$ from the continuous-data predictor $\hat{X}_I(s)$, in
the expression (2.8) of $\hat{X}_I(s)$:

(i) replace each quadratic-mean derivative $X^{(j)}(\pm T)$, $j = 1, \dots, k$,
by its natural approximation by means of the samples, e.g. replace
$X^{(1)}(T)$ by $[X(T) - X(t_{n-1})]/(T - t_{n-1})$, and

(ii) replace the integral $\int_{-T}^{T} c(t) X(t)\, dt$ by a Riemann-type
approximation in terms of its samples $\{c(t_k) X(t_k)\}_{k=1}^{n}$.

The appropriate quadrature rule for the approximation of the integral depends
on the number $k$ of quadratic-mean derivatives of the process and will be
specified in the sequel.

Here, for simplicity, only the cases $k = 0$ and $k = 1$ will be discussed in
detail as they bring out all the salient features of the problem at hand.


A.  No Quadratic-Mean Derivative (k = 0)


In this case the optimal continuous data predictor has the form

$$\hat{X}_I(s) = a_0 X(-T) + b_0 X(T) + \int_{-T}^{T} c(t) X(t)\, dt \tag{2.11}$$

where the constants $a_0$, $b_0$ and the filter $c(t)$ are the solution of
(2.9) with $k = 0$.

A.1.  Optimal-Coefficients Predictors

When the optimal-coefficients discrete-data predictor $\hat{X}_D(s)$ of (2.2)
is used, the excess mean-square error (2.7) can be written in the form

$$e_D^2(s) - e_I^2(s) = \| f_s - P_D f_s \|^2$$

where $f_s(t)$, $|t| \le T$, is the function in the reproducing kernel Hilbert
space of the covariance $R$, restricted to $[-T, T]^2$, which corresponds to
the random variable $\hat{X}_I(s)$:

$$f_s(t) = E[\hat{X}_I(s) X(t)] = a_0 R(t + T) + b_0 R(t - T)
  + \int_{-T}^{T} c(\tau) R(t - \tau)\, d\tau,$$

and $P_D$ denotes the projection on the space generated by
$\{R(\cdot - t_k)\}_{k=1}^{n}$. Thus the results in Sacks and Ylvisaker
[3, Theorem 3.1 and Remark 3.3] are applicable. Specifically, if
$\{D_n\}_{n=2}^{\infty}$ is a regular sequence of sampling designs generated by
a continuous, positive density $p(t)$ on $[-T, T]$, i.e.,
$D_n = \{t_{n,k}\}_{k=1}^{n}$, with $t_{n,k}$ satisfying

$$\int_{-T}^{t_{n,k}} p(t)\, dt = \frac{k-1}{n-1}, \quad k = 1, \dots, n, \tag{2.12}$$

then the excess error (2.7) of the discrete-time predictor (2.2) satisfies

$$n^2 [e_{D_n}^2(s) - e_I^2(s)] \to \frac{r}{12} \int_{-T}^{T} \frac{c^2(t,s)}{p^2(t)}\, dt \tag{2.13}$$

as the sample size $n \to \infty$, where $r = R'(0-) - R'(0+) > 0$. It is seen
from (2.13) that the excess error decreases precisely at the rate $1/n^2$ and
that the asymptotic constant depends on the correlation function $R(t)$ of the
process $X$ via the jump $r$ of its derivative at zero as well as via the
filter $c(t,s)$ (cf. the integral equation (2.9)). The result of (2.13) is
valid for any continuous and positive density $p(t)$ on $[-T, T]$. In
particular for the uniform density $p(t) = \frac{1}{2T} 1_{[-T,T]}(t)$ the
sampling instants are equally spaced:

$$t_{n,k} = T \Big[ \frac{2(k-1)}{n-1} - 1 \Big], \quad k = 1, \dots, n,$$

and (2.13) provides a precise result on the magnitude of the excess error:

$$n^2 [e_{D_n}^2(s) - e_I^2(s)] \to \frac{r T^2}{3} \int_{-T}^{T} c^2(t,s)\, dt.$$
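A regular sequence of designs places the $k$-th point at the $(k-1)/(n-1)$ quantile of the sampling density. The sketch below shows one way to compute such points by bisection on the cumulative distribution function; the names `quantile` and `regular_design`, and the choice of passing the distribution function rather than the density, are assumptions made for illustration.

```python
def quantile(cdf, q, lo, hi, tol=1e-12):
    """Solve cdf(t) = q on [lo, hi] by bisection; cdf must be increasing."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def regular_design(cdf, T, n):
    """Points t_{n,k}, k = 1..n, with cdf(t_{n,k}) = (k-1)/(n-1) on [-T, T]."""
    pts = [quantile(cdf, (k - 1) / (n - 1), -T, T) for k in range(1, n + 1)]
    pts[0], pts[-1] = -T, T          # the endpoints belong to the design exactly
    return pts

# Usage with the uniform density p(t) = 1/(2T), whose distribution function
# is (t + T)/(2T); the result should be the equally spaced grid.
T = 1.0
uniform_cdf = lambda t: (t + T) / (2.0 * T)
design = regular_design(uniform_cdf, T, 5)
```

With a nonuniform density the same routine clusters points where the density is large, which is the mechanism exploited by the tailored designs discussed next.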

The case of equally-spaced sampling instants is not the best possible, however.
If the sampling density $p(t)$ is chosen so as to minimize the asymptotic
constant in (2.13), i.e. (by Hölder's inequality),

$$p_s(t) = \frac{|c(t,s)|^{2/3}}{\int_{-T}^{T} |c(u,s)|^{2/3}\, du}, \quad |t| \le T, \tag{2.14}$$

we obtain from (2.12) a sequence $\{D_n^*\}$ of sampling designs which is
asymptotically optimal in the sense that [3]
It is seen from (2.14) that the sampling density $p_s(t)$ of the
asymptotically optimal sequence of designs depends explicitly on the filter
$c(t,s)$ and thus on the correlation function $R(t)$ of the process $X$ via
(2.9). This is, of course, in sharp contrast to equally-spaced sampling.
Furthermore, in general, the optimal sampling density $p_s(t)$ depends on the
time $s$ at which prediction is required. As will be seen from the example at
the end of this section, it could happen that the filter function $c(t,s)$
depends on the prediction time $s$ in a factorable form,
$c(t,s) = c_1(t) c_2(s)$, in which case the optimal sampling density $p_s(t)$
of (2.14) no longer depends on the prediction time $s$: $p_s(t) = p(t)$, a
desirable feature. In the general case one can derive a sampling density
$p(t)$, independent of $s$, by averaging the asymptotic constant in (2.13) with
a weight function $w(s)$ reflecting the prediction requirement as one moves
away from the observation interval $[-T, T]$:

$$\int_{-T}^{T} \Big( \int_{T}^{\infty} c^2(t,s) w(s)\, ds \Big) \frac{dt}{p^2(t)},$$

and choosing the sampling density $p(t)$ which minimizes this averaged
constant. The resulting sampling density $p(t)$ is proportional to

$$\Big( \int_{T}^{\infty} c^2(t,s) w(s)\, ds \Big)^{1/3}$$


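and thus does not depend on $s$, with corresponding asymptotic constant for the
excess prediction error

$$\frac{r}{12} \Big( \int_{-T}^{T} \Big( \int_{T}^{\infty} c^2(v,u) w(u)\, du \Big)^{1/3} dv \Big)^{2}
  \int_{-T}^{T} \frac{c^2(t,s)}{\big( \int_{T}^{\infty} c^2(t,u) w(u)\, du \big)^{2/3}}\, dt,$$

which does, of course, depend on $s$.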
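The effect of the sampling density on the asymptotic constant can be checked numerically. The sketch below compares the design-dependent factor $\int c^2 / p^2$ for the uniform density and for a density proportional to $|c|^{2/3}$. The filter `c` used here is purely hypothetical (an exponential peaked near $t = T$) and is not the filter of any example in the paper; the midpoint-rule integrator is likewise an illustrative choice.

```python
import math

def integrate(f, a, b, m=20000):
    """Composite midpoint rule with m subintervals."""
    h = (b - a) / m
    return h * sum(f(a + (i + 0.5) * h) for i in range(m))

T = 1.0
c = lambda t: math.exp(3.0 * t)      # hypothetical filter, concentrated near t = T

def design_factor(p):
    """Integral of c^2 / p^2 over [-T, T] for a sampling density p."""
    return integrate(lambda t: c(t) ** 2 / p(t) ** 2, -T, T)

# Normalizing constant of the density proportional to |c|^(2/3).
norm = integrate(lambda t: abs(c(t)) ** (2.0 / 3.0), -T, T)
p_opt = lambda t: abs(c(t)) ** (2.0 / 3.0) / norm
p_uniform = lambda t: 1.0 / (2.0 * T)

factor_uniform = design_factor(p_uniform)
factor_optimal = design_factor(p_opt)
```

For the tailored density the factor collapses to the cube of `norm`, which is the lower bound given by Hölder's inequality; the uniform density can only do as well when $|c|$ is constant.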

A.2.  Nonoptimal Predictors With Median Sampling

In this subsection we consider the performance of predictors $\hat{X}_D(s)$
whose coefficients $c_D$ are not optimal. As indicated earlier, such
nonoptimal-coefficients predictors can be obtained by a discrete-time
approximation of the optimal continuous-time predictor, which is now given by
(2.11). A discrete-time approximation to (2.11) is of the form

$$\hat{X}_{D_n}(s) = a_0 X(-T) + b_0 X(T)
  + \frac{1}{n-2} \sum_{k=1}^{n-2} \frac{c(t_{n,k})}{p(t_{n,k})} X(t_{n,k}) \tag{2.16}$$

for $n > 2$, where the coefficients $a_0$, $b_0$ and the function $c(t)$ are
those of the optimal continuous-time predictor. Here the sampling instants are
$D_n = \{-T, t_{n,1}, \dots, t_{n,n-2}, T\}$ where the sampling points
$\{t_{n,k}\}_{k=1}^{n-2}$ are the medians of a regular sequence of designs
generated by a positive and continuous sampling density $p(t)$ on $[-T, T]$,
i.e., satisfy

$$\int_{-T}^{t_{n,i}} p(t)\, dt = \frac{2i-1}{2(n-2)}, \quad i = 1, \dots, n-2, \quad n \ge 3. \tag{2.17}$$

Note that in this case the sampling points $\{t_{n,k}\}_{k=1}^{n-2}$ used in the
discrete approximation of the integral $\int_{-T}^{T} c(t) X(t)\, dt$ are all
in the interior $(-T, T)$ of $[-T, T]$. For example when $p(t) = \frac{1}{2T}$
we have

$$t_{n,i} = T \Big[ \frac{2i-1}{n-2} - 1 \Big], \quad i = 1, \dots, n-2, \quad n \ge 3.$$

The mean-square prediction error is given by

$$e_{D_n}^2(s) = E[X(s) - \hat{X}_{D_n}(s)]^2
  = R(0)[1 + a_0^2 + b_0^2] - 2 a_0 R(s + T) - 2 b_0 R(s - T) + 2 a_0 b_0 R(2T)$$
$$\quad - \frac{2}{n-2} \sum_{k=1}^{n-2} \frac{c(t_{n,k})}{p(t_{n,k})}
  [R(s - t_{n,k}) - a_0 R(T + t_{n,k}) - b_0 R(T - t_{n,k})]$$
$$\quad + \frac{1}{(n-2)^2} \sum_{j=1}^{n-2} \sum_{k=1}^{n-2}
  \frac{c(t_{n,j})}{p(t_{n,j})} \frac{c(t_{n,k})}{p(t_{n,k})} R(t_{n,j} - t_{n,k}). \tag{2.18}$$


The excess error of the discrete-time predictor (2.16) is then precisely the
error in the integral approximation, i.e.,

$$e_{D_n}^2(s) - e_I^2(s) = E\Big[ \int_{-T}^{T} c(t) X(t)\, dt
  - \frac{1}{n-2} \sum_{k=1}^{n-2} \frac{c(t_{n,k})}{p(t_{n,k})} X(t_{n,k}) \Big]^2$$

for $n \ge 3$. From the result of Schoenfelder [6] (see also Cambanis and
Masry [1]) we have

$$n^2 [e_{D_n}^2(s) - e_I^2(s)] \to \frac{r}{12} \int_{-T}^{T} \frac{c^2(t,s)}{p^2(t)}\, dt \tag{2.19}$$

as $n \to \infty$, where $r = R'(0-) - R'(0+) > 0$. The rectangular rule of
integral approximation was used in (2.16) because among all Newton-Cotes
formulae of order one, it results in the smallest value of the asymptotic
constant (2.19), as was shown by Schoenfelder [6].

It is seen from (2.19) that the nonoptimal-coefficients predictors (2.16)
have the same asymptotic performance as the optimal-coefficients predictors
(2.2). All comments made earlier about asymptotically optimal sequences of
designs and optimal sampling densities $p(t)$ are applicable here verbatim.
For a small sample size $n$ one expects the optimal-coefficients discrete-time
predictor (2.2) to have a smaller excess error than that of the
nonoptimal-coefficients predictor (2.16). Concerning the ease of
implementation, the optimal-coefficients predictors (2.2) require the
inversion of the covariance matrices $[R(t_{n,k} - t_{n,j})]_{k,j=1}^{n}$ for
each sample size $n$, which is not especially difficult to accomplish on a
computer. On the other hand, the nonoptimal-coefficients predictors (2.16)
require the solution of the prediction integral equation (2.9) with $k = 0$
for the determination of $a_0$, $b_0$ and $c(t)$ in (2.16), which may be a more
complex task, but is done once regardless of the sample size $n$.

A.3.  Example

We now illustrate by an example the performance of the optimal-coefficients
and nonoptimal-coefficients predictors in conjunction with uniform or
asymptotically optimal sampling schemes. We focus our attention on the
performance for finite sample size $n$, as the asymptotic performance is clear
from (2.13) and (2.19). In particular we obtain the improvement in performance
that the asymptotically optimal sampling (with density $p(t)$ of (2.14))
provides over the commonly used uniform sampling.

We assume the process $X$ has spectral density

$$\frac{2 \beta^3 (\alpha^2 + \lambda^2)}{\pi (\alpha^2 + \beta^2)(\beta^2 + \lambda^2)^2}$$

and correlation function

$$R(t) = e^{-\beta |t|} \Big[ 1 + \beta\, \frac{\alpha^2 - \beta^2}{\alpha^2 + \beta^2}\, |t| \Big] \tag{2.20}$$

where $\alpha, \beta > 0$. Note that $R(0) = 1$ and that $X$ has no
quadratic-mean derivatives. When $\alpha = \beta$, $X$ is first-order Markov so
that the optimal continuous-time predictor of $X(s)$, $s > T$, from the
observations $\{X(t),\ |t| \le T\}$ uses only the data point $X(T)$, namely

$$\hat{X}_I(s) = e^{-\beta (s - T)} X(T),$$

and the question of sampling designs does not arise. Hence we shall assume
$\alpha \ne \beta$ in the following.

The optimal continuous-time predictor is of the form (2.11) with $a_0$, $b_0$,
and $c(t)$ being the solution of the integral equation (2.9) with $k = 0$. They
can be found by the method in Rozanov [2] and, after lengthy computations,
we have


^0^^^  °"d /(T)'^  ‘  ~  ^ 

4 


-S(s-T) 


(2.21a) 


bg(s)  =  {1  +  B^(s-T)}e■‘^^®■^^ 

4 


(2.21b) 


1  2  -B(s-T1 

■(l.  ,)=-(r-B  )-~Yy-(s-T)e  =  d2(t)c2(s), 

4 


(2.21c) 


d„(t)  =  (.  +  c)%-(^+T)  _  ^^_p^m^-«(t+T) 


(2.21d) 


(2.21c) 




The mean-square prediction error $e_I^2(s)$ for the optimal continuous-time
predictor, given by (2.4), can be computed explicitly to yield

$$e_I^2(s) = 1 - a_0(s) R(s + T) - b_0(s) R(s - T) - c_2(s) A(s) \tag{2.22a}$$

where


— fiT  R 

A(s)  =  2e  {(a+6)e  [  (1  +  Bs +— g) sinh(a  +  6)T  -  BT  cosh(a  +  3)T] 


(a-6)e  (1  +  Bs  -  ^-^)sinh(«  -  B)T  +  BT  cosh  (a  -  B)  T]  )  , 


(2.22b) 


and

$$B = \beta\, \frac{\alpha^2 - \beta^2}{\alpha^2 + \beta^2}. \tag{2.22c}$$


For the discrete-time predictor (2.2) with optimal coefficients and sample size
$n$, the mean-square error $e_{D_n}^2(s)$ is given by (2.3), whereas for the
nonoptimal-coefficients discrete-time predictor (2.16) with sample size $n$ the
mean-square error $e_{D_n}^2(s)$ is given by (2.18).

In the following numerical results the observation interval is set to
$[-1, 1]$, i.e. $T = 1$, and the parameter $\beta$ of the correlation function
$R(t)$ is set to $\beta = 1$. After a preliminary investigation, the value of
the parameter $\alpha$ in the correlation function $R(t)$ was set to
$\alpha = 15$. The reasons for this choice are as follows. When $\alpha = 1$ we
have a first-order Markov process $X$ for which the continuous-time predictor
with optimal coefficients uses only the endpoint $X(1)$ so that, in this case,
the question of sampling designs for prediction is not interesting. Thus we are
interested in choosing $\alpha \ne 1$. For $0 < \alpha < 1$ numerical
computations showed that the mean-square prediction errors, $e_I^2(s)$ and
$e_{D_2}^2(s)$, for the optimal continuous-time predictor and respectively for
the discrete-time predictor with optimal coefficients using only $X(\pm 1)$,
are essentially identical so that the sampling design problem is again of no
interest. For $\alpha > 1$ the numerical results showed that the difference
between $e_I^2(s)$ and $e_{D_2}^2(s)$ becomes more pronounced as $\alpha$
increases. For example for $\alpha = 5.2$ we have $e_I^2(2) = .44$ and
$e_{D_2}^2(2) = .48$ so that the fractional error for $n = 2$ is already too
small to exhibit the performance of the design for different sample sizes $n$.
For $\alpha = 10$ the corresponding numbers are $e_I^2(2) = .38$ and
$e_{D_2}^2(2) = .458$. It is thus seen that we need to choose $\alpha$ much
larger than $\beta = 1$ in order to deviate sufficiently from the Gauss-Markov
case. We have chosen $\alpha = 15$ in order for the sampling design problem to
be of interest.
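The two-sample figure quoted above for $\alpha = 10$ can be reproduced directly from the covariance. The sketch below assumes that the correlation function of the example reads $R(t) = e^{-\beta|t|}[1 + B|t|]$ with $B = \beta(\alpha^2 - \beta^2)/(\alpha^2 + \beta^2)$, and evaluates the optimal-coefficients error (2.3) for the two endpoint samples $X(\pm T)$ using the closed-form inverse of a 2 by 2 matrix. The function names are illustrative only.

```python
import math

def example_cov(t, alpha, beta=1.0):
    """Assumed correlation function exp(-beta|t|) * (1 + B|t|) of the example."""
    b = beta * (alpha ** 2 - beta ** 2) / (alpha ** 2 + beta ** 2)
    return math.exp(-beta * abs(t)) * (1.0 + b * abs(t))

def mse_two_endpoints(alpha, s, T=1.0):
    """Error of the optimal linear predictor of X(s) from X(-T) and X(T)."""
    r0 = example_cov(0.0, alpha)          # variance
    r12 = example_cov(2.0 * T, alpha)     # covariance between the two samples
    u = example_cov(s + T, alpha)         # covariance of X(s) with X(-T)
    v = example_cov(s - T, alpha)         # covariance of X(s) with X(T)
    det = r0 * r0 - r12 * r12
    # Quadratic form R_D'(s) R_D^{-1} R_D(s) for the 2 x 2 covariance matrix.
    quad = (r0 * u * u - 2.0 * r12 * u * v + r0 * v * v) / det
    return r0 - quad

mse_alpha_10 = mse_two_endpoints(alpha=10.0, s=2.0)
```

Under these assumptions the computed error agrees with the quoted value .458 to three decimals, which also serves as a consistency check on the assumed form of the correlation function.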

A.3.1.  Optimal-Coefficients Predictors

Figure 1 compares the error $e_I^2(s)$ of the continuous-time predictor with
the error $e_{D_n}^2(s)$ of the discrete-time predictor with optimal
coefficients and equally spaced samples, for prediction lags
$s - T \in [0, 3]$. It is seen that for a sample size $n = 10$, the two
mean-square errors are very close. Note that for very small and for very large
prediction lags, the performance of the two predictors should be very close
even when the sample size is small ($n = 2$), as expected intuitively, since
for zero lag the prediction error in both cases is zero, and as the lag tends
to infinity, the prediction error approaches $R(0) = 1$.

Figure 2 provides a similar comparison when the discrete-time predictor
with optimal coefficients uses the asymptotically optimal sampling instants
(a regular sequence (2.12) generated by the sampling density $p(t)$ of (2.14)).
While in general $p(t)$ of (2.14) depends on the prediction time $s$, leading
to different sampling instants for different values of $s$ (cf. (2.12)), it is
seen from the dependence of $c(t)$ on $s$ in (2.21c) that for this example
$p(t)$ is proportional to


P(t)-  +  |t|sT. 


which is independent of $s$, so that the sampling instants
$\{t_{n,k}\}_{k=1}^{n}$ generated by $p(t)$ via (2.12) are the same for all
prediction times $s > T$. It is seen from Figure 2 that with only $n = 3$
samples the performance of the discrete-time predictor is already very close to
that of the continuous-time predictor. The improvement that the asymptotically
optimal sampling provides over uniform sampling is illustrated in Figure 3,
where the fractional error

$$\gamma_n(s) = \frac{e_{D_n}^2(s)}{e_I^2(s)} - 1 \tag{2.24}$$

is plotted as a function of the sample size $n$, for both uniform and
asymptotically optimal sampling. The dramatic improvement provided by
asymptotically optimal sampling is readily apparent for moderate values of the
prediction lag $s - T$. For example for lag $s - T = 1$, using $n = 3$ samples,
we have


$$\gamma_3(1) = \begin{cases} .171, & \text{for uniform sampling,} \\ .076, & \text{for asymptotically optimal sampling,} \end{cases}$$

an improvement factor of 2.25, and for $n = 5$ samples,

$$\gamma_5(1) = \begin{cases} .087, & \text{for uniform sampling,} \\ .0026, & \text{for asymptotically optimal sampling,} \end{cases}$$

an improvement factor of 33.4.


A.3.2.  Nonoptimal-Coefficients Predictors

For the discrete-time predictor (2.16) which uses nonoptimal coefficients
and whose mean-square prediction error $e_{D_n}^2(s)$ is given by (2.18) it was
found that its performance with uniform sampling is exceptionally bad.
Specifically, even for $n = 10$ samples, the error exceeds $R(0) = 1$ for all
lags $s - T \in (.1, 3]$, and equals 4.3 for lag $s - T = 1$. Matters are much
worse for $n = 2$ samples where the error equals 23.06 for lag $s - T = 1$.
This behavior can be explained as follows. The excess error for the
nonoptimal-coefficients predictor (2.16) is precisely due to the approximation
of the integral $\int_{-T}^{T} c(t) X(t)\, dt$ by the sum in (2.16). Now for
the selected values of $\alpha$ and $\beta$, $c(t)$ of (2.21c) has most of its
mass concentrated near the right end point $t = T = 1$. When uniform sampling
points are used, only for very large values of $n$ will there be enough samples
near the right end point of the interval $[-T, T]$ to provide a reasonable
approximation for the integral. This problem does not arise with predictors
using uniform sampling and optimal coefficients since in this case the
coefficients $\{c_{n,k}\}$ of the predictor are the best possible and each
$c_{n,k}$ depends on all sampling points $\{t_{n,k}\}_{k=1}^{n}$. Indeed even
for $n = 2$ samples the performance is quite good, as seen from Figure 1.

When asymptotically optimal sampling is used (cf. (2.23) and (2.17)) the
performance of the nonoptimal-coefficients predictor is quite reasonable. This
is because the mass of $p(t)$, which is again independent of the prediction
time $s$, is concentrated near the right end point of the interval $[-T, T]$,
just like $c(t,s)$, and thus the sampling instants $\{t_{n,k}\}_{k=1}^{n-2}$
generated by $p(t)$ via (2.17) are also clustered around the end point $T$.
Figure 4 provides a comparison with the performance of the continuous-time
predictor as a function of prediction lag $s - T$. Note that when $n = 5$
samples are used, the performance of the discrete and continuous time
predictors is very close. The fractional error is plotted in Figure 5 as a
function of the sample size $n$ for various values of the prediction lag
$s - T$.

If we compare the performance of the optimal-coefficients and
nonoptimal-coefficients predictors when both use the (asymptotically optimal)
sampling density $p(t)$ of (2.23), we find that the optimal-coefficients
predictor is superior. For example for lag $s - T = 1$ we have

    n                        2        3        5        8
    Optimal predictor        .25      .016     .0026    .00066
    Nonoptimal predictor     63.6     .28      .0163    .0026

Thus the optimal-coefficients predictor requires considerably fewer samples to
achieve a prescribed level of error.



B.  Exactly One Quadratic-Mean Derivative (k = 1)


In this case the optimal continuous-data predictor has the form

$$\hat{X}_I(s) = a_0 X(-T) + b_0 X(T) + a_1 X'(-T) + b_1 X'(T)
  + \int_{-T}^{T} c(t) X(t)\, dt \tag{2.25}$$

where $a_0$, $b_0$, $a_1$, $b_1$ and $c(t)$ are the solutions of the integral
equation (2.9) with $k = 1$, and thus depend on the time $s$ of prediction,
which is suppressed.

B.1.  Optimal-Coefficients Predictors

When the optimal-coefficients discrete-data predictor is used, the excess
mean-square error can be written in the form $\| f_s - P_D f_s \|^2$, as when
$k = 0$, but now the function $f_s(t)$, $|t| \le T$, is of the form

$$f_s(t) = a_0 R(t + T) + b_0 R(t - T) - a_1 R'(t + T) - b_1 R'(t - T)
  + \int_{-T}^{T} c(\tau) R(t - \tau)\, d\tau$$

and the results of Sacks and Ylvisaker [3] are no longer applicable because
of the presence of derivatives of the covariance. Thus no precise rates of
convergence to zero of the excess error are available, and the subsequent
results on nonoptimal-coefficients predictors in subsection B.2 provide upper
bounds for the optimal-coefficients predictor. Some conjectures are also
offered in subsection B.3.

B.2.  Nonoptimal-Coefficients Predictors

Using the trapezoidal rule in approximating the integral
$\int_{-T}^{T} c(t) X(t)\, dt = \int cX$ in (2.25) by

$$I_n = \frac{1}{n-1} \sum_{k=1}^{n-1} \frac{1}{2} \Big[
  \frac{c(t_{n,k})}{p(t_{n,k})} X(t_{n,k})
  + \frac{c(t_{n,k+1})}{p(t_{n,k+1})} X(t_{n,k+1}) \Big]$$

we obtain a simple nonoptimal-coefficients discrete-data predictor of the form

$$\hat{X}_{D_n}(s) = a_0 X(-T) + b_0 X(T)
  + a_1 \frac{X(t_{n,2}) - X(-T)}{t_{n,2} + T}
  + b_1 \frac{X(T) - X(t_{n,n-1})}{T - t_{n,n-1}} + I_n \tag{2.26}$$


where $\{D_n\}$ is a regular sequence of designs generated by the continuous
and positive sampling density $p(t)$, i.e. the sampling points in $D_n$ are
$-T = t_{n,1} < t_{n,2} < \dots < t_{n,n-1} < t_{n,n} = T$ and are specified by
(2.12). Denoting by $\Delta_{-T,n}$, $\Delta_{T,n}$ the errors in the
approximation of the quadratic-mean derivatives, i.e.
$\Delta_{-T,n} = X'(-T) - [X(t_{n,2}) - X(-T)]/(t_{n,2} + T)$,
$\Delta_{T,n} = X'(T) - [X(T) - X(t_{n,n-1})]/(T - t_{n,n-1})$, we can express
as follows the excess mean-square error in view of (2.6), (2.25) and (2.26),

$$e_{D_n}^2(s) - e_I^2(s) = a_1^2 E(\Delta_{-T,n}^2) + b_1^2 E(\Delta_{T,n}^2)$$
$$\quad + 2 a_1 b_1 E(\Delta_{-T,n} \Delta_{T,n})$$
$$\quad + 2 E\{(a_1 \Delta_{-T,n} + b_1 \Delta_{T,n})(\textstyle\int cX - I_n)\}$$
$$\quad + E(\textstyle\int cX - I_n)^2. \tag{2.27}$$

The asymptotic performance of each term is derived in the Appendix, assuming
$p$ and $c/p$ are twice continuously differentiable, and this leads to

e^  (s)  -  cj(s)  =  -  C,(s)ll  +  0(1)] 

D  I  n  i 

n 


+  -^Cjls)!!  +  o(l)] 
n 




(2. 28) 


as $n \to \infty$, where each line in (2.28) shows the asymptotic performance
of the corresponding line in (2.27), and the constants $C_i(s)$ are specified
in the Appendix. Thus the overall slow rate $n^{-1}$ is due to the slow
convergence of the quadratic-mean derivative approximation, and we have
(cf. (A.5))

$$n [e_{D_n}^2(s) - e_I^2(s)] \to C_1(s)
  = \frac{\rho}{3} \Big\{ \frac{a_1^2(s)}{p(-T)} + \frac{b_1^2(s)}{p(T)} \Big\} \tag{2.29}$$

where $\rho$ is the jump of the third derivative of $R$ at zero:
$\rho = R^{(3)}(0+) - R^{(3)}(0-) > 0$. In particular, the slow rate $n^{-1}$
is the maximum possible when uniform sampling is employed.


It is clear from (2.27) and (2.28) that the integral approximation has a much faster rate of convergence, $n^{-4}$. This substantial loss of rate of convergence can be averted by replacing in each $n$-point sampling design $D_n$ the points $t_{n,2}$ and $t_{n,n-1}$ by $(t_{n,2}+T)^4 - T$ and $T - (T-t_{n,n-1})^4$ respectively. The resulting modified regular sequence of designs $\{D_n'\}$ (which can no longer be uniform) has excess error whose term by term asymptotic performance is described by


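$$\begin{aligned}
e_{D_n'}^2(s) - e_T^2(s) ={}& \frac{1}{n^4}C_1'(s)[1+o(1)] \\
&+ \frac{1}{n^8}C_2'(s)[1+o(1)] \\
&+ \frac{1}{n^6}C_3'(s)[1+o(1)] \\
&+ \frac{1}{n^4}C_4(s)[1+o(1)].
\end{aligned} \tag{2.30}$$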


In this case the rate of convergence is $n^{-4}$ and, as $n\to\infty$,

$$n^4[e_{D_n'}^2(s) - e_T^2(s)] \to C_1'(s) + C_4(s) \tag{2.31}$$


where (cf. (A.8), (A.18))

$$C_1'(s) = \frac{\rho}{3}\left\{\frac{a_1^2(s)}{p^4(-T)} + \frac{b_1^2(s)}{p^4(T)}\right\},$$

$C_4(s)$ is the sum of a term proportional to $\int_{-T}^{T} c^2(t,s)\,p^{-4}(t)\,dt$ and a term $C(s)$, and $C(s)$ is given in (A.19) and depends on $s$ only through $c(\pm T,s)$ and $c'(\pm T,s)$, and on $p$ only through the boundary values $p(\pm T)$, $p'(\pm T)$. Because of the dependence of the asymptotic constant in (2.31) on the values of $p$ and $p'$ at the endpoints $\pm T$, its minimization with respect to $p(t)$ is messy and perhaps not feasible in view of the continuity requirements on $p$. On the other hand, the part of the asymptotic constant which depends on $p(t)$, $|t| < T$, i.e. $\int c^2p^{-4}$, is minimized when $p(t)$ is proportional to $|c(t,s)|^{2/5}$.
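As an illustration of sampling at fixed quantiles of a density proportional to $|c(t,s)|^{2/5}$, the following sketch computes the sampling points numerically. The function name `regular_design` and the grid size are hypothetical, and the quantile rule $\int_{-T}^{t_{n,k}} p = (k-1)/(n-1)$ is an assumption standing in for (2.12), which is not reproduced in this section.

```python
import bisect


def regular_design(c, T, n, grid=20001):
    """Points t_1 < ... < t_n on [-T, T] at equally spaced quantiles of the
    density proportional to |c(t)|**(2/5).

    Assumption: the k-th point satisfies int_{-T}^{t_k} p = (k-1)/(n-1).
    """
    if n < 2:
        raise ValueError("need at least two sampling points")
    ts = [-T + 2.0 * T * i / (grid - 1) for i in range(grid)]
    dens = [abs(c(t)) ** 0.4 for t in ts]
    # Cumulative trapezoidal integral of the unnormalized density.
    cum = [0.0]
    for i in range(1, grid):
        cum.append(cum[-1] + 0.5 * (dens[i] + dens[i - 1]) * (ts[i] - ts[i - 1]))
    total = cum[-1]
    points = []
    for k in range(n):
        target = total * k / (n - 1)
        j = bisect.bisect_left(cum, target)
        if j == 0:
            points.append(ts[0])
        elif j >= grid:
            points.append(ts[-1])
        else:
            # Linear interpolation of the inverse distribution function.
            lo, hi = cum[j - 1], cum[j]
            frac = 0.0 if hi == lo else (target - lo) / (hi - lo)
            points.append(ts[j - 1] + frac * (ts[j] - ts[j - 1]))
    return points
```

With a constant kernel the density is uniform and the design reduces to equally spaced points; a kernel that grows toward $T$ pulls the points toward the right end of the interval.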

A further small improvement can be achieved by bringing the modified sampling points even slightly closer to the endpoints so as to cancel asymptotically the effect of the approximation of the quadratic-mean derivatives. For instance if the modified sampling points are chosen by $t_{n,2}'' = (t_{n,2}+T)^{4+\epsilon} - T$, $t_{n,n-1}'' = T - (T-t_{n,n-1})^{4+\epsilon}$, $\epsilon > 0$, then the resulting modified regular sequence of designs $\{D_n''\}$ has, of course, the same rate but smaller asymptotic constant:

$$n^4[e_{D_n''}^2(s) - e_T^2(s)] \to C_4(s). \tag{2.32}$$


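B.3. Example

Here the process $X$ has the spectral density

$$\phi(\lambda) = \frac{8\beta^5}{\pi(3\alpha^2+\beta^2)}\,\frac{\alpha^2+\lambda^2}{(\beta^2+\lambda^2)^3}$$

and correlation function

$$R(t) = e^{-\beta|t|}\left\{1 + \beta|t| + \frac{\beta^2(\alpha^2-\beta^2)}{3\alpha^2+\beta^2}\,t^2\right\}$$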


where $\alpha, \beta > 0$. Note that $R(0) = 1$ and that $X$ has precisely one quadratic-mean derivative. When $\alpha = \beta$, the process $X$ is second-order Markov, for which the optimal continuous-time predictor of $X(s)$, $s > T$, from the observation $\{X(t), |t| \le T\}$ uses only the data points $X(T)$ and $X'(T)$. Since the derivative $X'(T)$ is not part of the observation, the case $\alpha = \beta$ is still of some interest. However, we shall consider the case $\alpha \ne \beta$ in order to exhibit the performance of sampling designs which provide data points inside the interval $[-T,T]$.
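The correlation function of this example is simple to evaluate directly. The sketch below uses the closed forms of $R(t)$ and of its derivative $R'(t)$, $t > 0$, as reconstructed from the scanned text, so both should be read as assumptions; the function names `R` and `R_prime` are hypothetical. A finite-difference comparison gives a quick consistency check between the two expressions.

```python
import math


def R(t, alpha, beta):
    """Correlation function of the example (reconstructed form, R(0) = 1)."""
    a2, b2 = alpha * alpha, beta * beta
    gam = b2 * (a2 - b2) / (3.0 * a2 + b2)
    s = abs(t)
    return math.exp(-beta * s) * (1.0 + beta * s + gam * s * s)


def R_prime(t, alpha, beta):
    """Derivative of R for t > 0 (reconstructed form)."""
    a2, b2 = alpha * alpha, beta * beta
    scale = b2 * t * math.exp(-beta * t) / (3.0 * a2 + b2)
    return -scale * ((a2 + 3.0 * b2) + beta * (a2 - b2) * t)


def central_difference(t, alpha, beta, d=1e-5):
    """Numerical derivative of R at t > d, used only as a consistency check."""
    return (R(t + d, alpha, beta) - R(t - d, alpha, beta)) / (2.0 * d)
```

For $\alpha = \beta$ the quadratic term vanishes and $R(t)$ reduces to $e^{-\beta|t|}(1+\beta|t|)$, the second-order Markov case mentioned above.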

The coefficients $a_0$, $b_0$, $a_1$, $b_1$ and $c(t)$ in the optimal continuous-time predictor (2.25) can be obtained as follows. From Rozanov [2] or Yaglom [7], it can be seen that $c(t)$ is of the form $c(t) = B_1e^{-\alpha t} + B_2e^{\alpha t}$, and substituting $h(t)$ of (2.10) with $k=1$ in the integral equation (2.9), carrying out the integration, and equating the coefficients of $t^{\ell}$, $\ell = 0,1,2$, on both sides of (2.9) leads to a system of six equations in the unknowns $a_0, b_0, a_1, b_1, B_1, B_2$ (the general expressions for $a_i$, $b_i$ given in Rozanov [2, p. 137] are incorrect; hence the substitution approach taken here). After lengthy computations we find
.  '  1  _  3aB  ..  2  ,,2  3  2  -B(s-T) 

o 


a^(s)  =  -aQ(.s)/(3B). 


A  fT) 

,  ^  \  T^4.r3  c  ^  1  2  .2,.  „.2^  -;'(s-T) 

b„(s)  =  '  1  -t-  .  (s-T)  -t-  ”  V-  “  1  rc 


where $d_m$ and $\delta_m$ are defined in (2.21d-e).

The mean square prediction error $e_T^2(s)$ for the optimal continuous-time predictor is given by (2.4) and can be computed explicitly to yield


where 


0j.(s)  =  1  -  a^RCs +  T)  -  bQR(s  -  T) +aj^R' (s-t- T) -1  bj^R' (s  -  T)  -  c^(s)A(s) 


A(s)  =  2e‘®®{(a  +  B)^e'^'^[(l+^+ [T^-t-— — y]B  +  [  6  + s  +  Bs^)  sinh(B  +  a)T 


T(B  +  -STT  +  2Bs)cosh(B  +  a)T] 

p+Ct 


(a  -  P)^e  '^'^[(1+^+  [T^+— -JB  +  [6+^]s  +  Bs^)  sinh(B-u)T 

(B-a)  ^  ^ 


-T(B  +  ^  +  2Bs)cosh(6  -  a)T] } , 

p— ct 


2  „2, 
g  (g  -  B  ) 

2  2  ’ 
3a  -t-  B 


$$R'(t) = -\frac{\beta^2\,t\,e^{-\beta t}}{3\alpha^2+\beta^2}\{(\alpha^2+3\beta^2) + \beta(\alpha^2-\beta^2)t\}, \qquad t > 0.$$


For the discrete-time predictor (2.2) with optimal coefficients and sample size $n$, the mean-square error $e_n^2(s)$ is given by (2.3), whereas for the corresponding predictor (2.26) with nonoptimal coefficients the mean-square error $e_{D_n}^2(s)$ is given by (2.27).

In the following numerical results the observation interval is set to $[-1,1]$, i.e., $T = 1$, and the parameter $\beta$ of the correlation function $R(t)$ is set to $\beta = 1$. The behavior of $e_T^2(s)$ and $e_n^2(s)$ (with uniform sampling) as a function of $\alpha$, for a fixed $s$, was investigated numerically: results show that as $\alpha$ increases both $e_T^2(s)$ and $e_n^2(s)$ decrease, with $e_n^2(s)$ decreasing at a slower rate. For example, for $s = 1.2$ and $\alpha = .1, 1.2, 9.2$ the values of $e_T^2(s)$ are .64, .276, .074, respectively, whereas the values of $e_n^2(s)$ are .797, .398, .23 respectively. We have selected a moderate value of $\alpha = 2.5$.

B.3.1. Optimal-Coefficients Predictors

Figure 6 compares the error $e_T^2(s)$ of the continuous-time predictor with the error $e_n^2(s)$ of the discrete-time predictor with optimal coefficients and equally-spaced samples for prediction lags $s - T \in [0,3]$. It is seen that $e_n^2(s)$ approaches $e_T^2(s)$, as $n$ increases, rather slowly. In Figure 7 the fractional error $\gamma_n^2(s)$ (cf. (2.24)) is plotted as a function of $n$ with the lag $s - T$ as parameter. It is clear that the fractional error is fairly large even for a sample size $n = 10$.


In view of the analysis in Sections B.1 and B.2 we are led to believe that the rate of convergence of $e_n^2(s) - e_T^2(s)$ to zero is perhaps $1/n$, due to the implicit approximation of the derivatives $X'(\pm T)$ by linear combinations of uniformly-spaced samples $\{X[2T(k-1)/(n-1) - T]\}$. It should be noted that the approximation of the integral $\int_{-T}^{T} c(t)X(t)\,dt$ by a weighted sum of the samples $X[2T(k-1)/(n-1) - T]$ has a rate of convergence of at least $1/n^4$.


The numerical results can shed light on the rate of convergence. Suppose that the asymptotic result is of the form

$$n^k[e_n^2(s) - e_T^2(s)] \to K(s) \tag{2.33}$$

as $n\to\infty$ for some (unknown) constants $k$ and $K(s)$. Then as $n\to\infty$,

$$n^k\gamma_n^2(s) \to K(s)/e_T^2(s) = Q(s).$$
In Figure 8 a plot of $n^k\gamma_n^2(s)$ is given, for lag $s - T = 1.5$, as a function of the sample size $n$ for possible values of $k = 1,2,3,4$. It is evident that when $k = 1$, $n\gamma_n^2(s)$ approaches a constant fairly quickly, supporting our belief that the rate of convergence in (2.33) is indeed $1/n$. We also conducted a mean-square fit for $k$ and $Q(s)$ by minimizing

$$J(k) = \sum_{n=n_1}^{n_2}\{n^k\gamma_n^2(s) - Q(s)\}^2$$

with respect to $k$ and $Q(s)$. For the range of sample sizes $n = 15,\ldots,30$ ($n_1 = 15$, $n_2 = 30$) we found that the best fit is


$$k = 1.035, \qquad K(s) = \begin{cases} 10.118 & \text{for } s - T = 1.0 \\ 2.35 & \text{for } s - T = 1.5 \\ .848 & \text{for } s - T = 2.0 \\ .379 & \text{for } s - T = 2.5 \end{cases}$$


Interestingly, the best fit for $k$ turned out to be independent of the 4 values of $s$ listed above. This result clearly supports the conjecture that the rate of convergence of $e_n^2(s) - e_T^2(s)$ is $1/n$.
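The mean-square fit for the exponent can be sketched as follows. For a fixed exponent the minimizing constant is the average of the scaled fractional errors, so only a one-dimensional search over the exponent remains. The function name `fit_rate` is hypothetical, and the search is restricted to a bounded grid of exponents, which is an assumption made here because the criterion tends to zero trivially as the exponent decreases without bound.

```python
def fit_rate(ns, gammas, k_min=0.5, k_max=4.0, steps=3501):
    """Least-squares fit of k and Q in n**k * gamma_n ~ Q.

    ns     : sample sizes n_1, ..., n_2.
    gammas : fractional errors gamma_n^2(s) at those sample sizes.
    The bounded grid [k_min, k_max] is an assumption of this sketch.
    """
    best = None
    for i in range(steps):
        k = k_min + (k_max - k_min) * i / (steps - 1)
        vals = [n ** k * g for n, g in zip(ns, gammas)]
        q = sum(vals) / len(vals)  # minimizer of J over Q for this k
        cost = sum((v - q) ** 2 for v in vals)
        if best is None or cost < best[0]:
            best = (cost, k, q)
    return best[1], best[2]
```

Applied to the fractional errors computed for $n = 15,\ldots,30$, a routine of this kind returns the pair $(k, Q(s))$; on synthetic input of the exact form $Q/n$ it recovers the exponent 1.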

The above slow rate of convergence can be dramatically improved if we modify the uniform sampling scheme $t_{n,i} = T(2i-n-1)/(n-1)$, $i = 1,\ldots,n$, by appropriately shifting $t_{n,2}$ and $t_{n,n-1}$ towards the ends of the data interval $[-T,T]$ so as to achieve a better approximation of the derivatives $X'(T)$ and $X'(-T)$. Suppose, for example, we let

$$t_{n,2} = \left(\frac{2T}{n-1}\right)^4 - T, \qquad t_{n,n-1} = T - \left(\frac{2T}{n-1}\right)^4, \tag{2.34}$$


for $n > 2$, so that the approximation of the derivative $X'(-T)$ by only two data points $X(t_{n,1})$ and $X(t_{n,2})$ (and similarly for $X'(T)$ by $X(t_{n,n})$ and $X(t_{n,n-1})$) achieves a rate of convergence of $1/n^4$. We then expect the performance of the discrete-time predictor with such a modified uniform sampling to improve dramatically in comparison to uniform sampling. This turns out to be the case; in Figure 9, $e_n^2(s)$ is plotted as a function of the lag $s - T$ for sample sizes $n = 2, 5$ when this modified uniform sampling scheme (2.34) is used. It is seen that with $n = 5$, the performance is already very close to that of the continuous-time predictor (contrast it with Figure 6 for uniform sampling). This sharp improvement for the modified uniform sampling scheme can be seen more clearly in Figure 10, where the fractional error $\gamma_n^2(s)$ is displayed as a function of $n$ for 3 selected values of $s$; it is seen that the modified sampling scheme with $n = 7$ outperforms the uniform sampling scheme with $n = 30$ by a factor of about 10. For example, for lag $s - T = 1.5$, modified uniform sampling with only $n = 7$ gives a fractional error of $5.12\times10^{-3}$, whereas uniform sampling with $n = 7$ gives a fractional error of .109 and even when $n = 30$, we have a fractional error of $2.37\times10^{-2}$.
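The modified uniform sampling scheme (2.34) is straightforward to generate. In the sketch below the function names are hypothetical; the modification is applied only for $n \ge 4$ and only when the uniform spacing is below 1, two restrictions assumed here so that the second and next-to-last points are distinct and the points stay ordered.

```python
def uniform_points(n, T):
    """Uniform sampling t_{n,i} = T(2i - n - 1)/(n - 1), i = 1, ..., n."""
    return [T * (2 * i - n - 1) / (n - 1) for i in range(1, n + 1)]


def modified_uniform_points(n, T, power=4):
    """Uniform sampling with t_{n,2} and t_{n,n-1} moved as in (2.34).

    Assumptions of this sketch: points are left unchanged for n < 4, and the
    uniform spacing 2T/(n-1) must be below 1 so that its power shrinks it.
    """
    t = uniform_points(n, T)
    if n >= 4:
        spacing = 2.0 * T / (n - 1)
        if spacing >= 1.0:
            raise ValueError("spacing 2T/(n-1) must be below 1")
        d = spacing ** power
        t[1] = d - T       # second point, shifted towards -T
        t[-2] = T - d      # next-to-last point, shifted towards T
    return t
```

For $T = 1$ and $n = 7$ the second point moves from $-2/3$ to $(1/3)^4 - 1$, that is, to within $1/81$ of the left endpoint.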

With this modified uniform sampling we expect a rate of convergence of $1/n^4$, but no such analytical result is yet available. Due to numerical instability in the inversion of the covariance matrix when modified uniform sampling (2.34) is used with $n > 7$, we were unable to computationally verify this conjectured rate of convergence.
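One way to see why the covariance matrix becomes hard to invert is to bound its condition number from below. The matrix has unit diagonal, so its largest eigenvalue is at least 1, and by eigenvalue interlacing its smallest eigenvalue is at most $1 - R(d)$ for any pair of sampling points at distance $d$. The sketch below evaluates the resulting bound for the closest adjacent pair; it uses the reconstructed correlation function of the example with $\alpha = 2.5$, $\beta = 1$ as an assumption, and all function names are hypothetical.

```python
import math


def R(t, alpha=2.5, beta=1.0):
    """Correlation function of the example (reconstructed form)."""
    a2, b2 = alpha * alpha, beta * beta
    gam = b2 * (a2 - b2) / (3.0 * a2 + b2)
    s = abs(t)
    return math.exp(-beta * s) * (1.0 + beta * s + gam * s * s)


def sampling_points(n, T=1.0, modified=False):
    """Uniform points on [-T, T]; optionally apply the shift (2.34) for n >= 4."""
    t = [T * (2 * i - n - 1) / (n - 1) for i in range(1, n + 1)]
    if modified and n >= 4:
        d = (2.0 * T / (n - 1)) ** 4
        t[1], t[-2] = d - T, T - d
    return t


def cond_lower_bound(points, alpha=2.5, beta=1.0):
    """Lower bound 1 / (1 - R(d)) on the 2-norm condition number of the
    matrix [R(t_i - t_j)], with d the smallest gap between adjacent points."""
    d = min(b - a for a, b in zip(points, points[1:]))
    return 1.0 / (1.0 - R(d, alpha, beta))
```

Because the shifted points approach the endpoints like $n^{-4}$, the bound grows like $n^{8}$ under the modified scheme, which is in line with the reported difficulty for $n > 7$; this is only a lower bound, not the actual condition number.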


B.3.2. Nonoptimal-Coefficients Predictors

When the nonoptimal-coefficients predictor (2.26) is used with equally-spaced samples the mean-square error $e_{D_n}^2(s)$ has precisely a rate of convergence equal to $1/n$, by (2.29). Figure 11 compares $e_{D_n}^2(s)$ for $n = 2, 5, 10$, with $e_T^2(s)$ of the optimal continuous-time predictor. More interestingly, Figure 12 provides a comparison of performance between the discrete-time predictor with optimal coefficients and that with nonoptimal coefficients; here the fractional error is plotted for each case as a function of the sample size $n$ for 3 representative values of the lag $s - T$. It is seen that the performance of the two predictors is fairly close in this case with equal rate of convergence ($1/n$).

From (2.31) we know that by modifying the two sampling points $t_{n,2}$ and $t_{n,n-1}$ as in (2.34) we obtain a precise rate of convergence $1/n^4$ for the discrete-time predictor with nonoptimal coefficients (2.26). The performance for finite sample size $n = 2,\ldots,30$ is displayed in Figures 13-15. Figure 13 exhibits the mean-square error $e_{D_n'}^2(s)$ for $n = 2, 5, 10$, and $e_T^2(s)$ of the optimal continuous-time predictor. This should be compared with Figure 11 where uniform sampling is employed. Such a comparison is more clearly displayed in Figure 14, from which the dramatic reduction in the fractional error is evident (for a fixed lag) under the modified uniform sampling scheme.

Finally one may wish to compare the performance of the two discrete-time predictors, with optimal coefficients and with nonoptimal coefficients (2.26), both using the modified uniform sampling scheme. Such a comparison is given in Figure 15, from which it is seen that for small sample size $n \le 7$ the predictor with optimal coefficients significantly outperforms the one with nonoptimal coefficients. This is considerably more pronounced here under modified uniform sampling than in Figure 12 under uniform sampling. One possible explanation is that the predictor with optimal coefficients does a much better job in estimating the derivatives $X'(\pm T)$ than the one with nonoptimal coefficients, implicitly using all data points instead of just two points as in (2.26).
APPENDIX

Here we derive the asymptotics of the terms on the right hand side of (2.27).

A. Approximation of Quadratic-mean Derivatives

For the mean-square error in the approximation of the quadratic-mean derivatives (first line in (2.27)) we have


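$$\begin{aligned}
&E\{X^{(1)}(u) - \tfrac{1}{h}[X(u) - X(u-h)]\}^2 \\
&\quad= -R^{(2)}(0) - \tfrac{2}{h}[R^{(1)}(0) - R^{(1)}(h)] + \tfrac{2}{h^2}[R(0) - R(h)] \\
&\quad= -R^{(2)}(0) + \tfrac{2}{h}[hR^{(2)}(0) + \tfrac{1}{2}h^2R^{(3)}(0\pm) + o(h^2)] \\
&\qquad - \tfrac{2}{h^2}[\tfrac{1}{2}h^2R^{(2)}(0) + \tfrac{1}{6}h^3R^{(3)}(0\pm) + o(h^3)] \\
&\quad= \tfrac{2}{3}hR^{(3)}(0\pm) + o(h)
\end{aligned} \tag{A.1}$$

where the right and left derivative corresponds to $h$ positive and negative respectively.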
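The leading term in (A.1) can be checked numerically for the correlation function of the example. The sketch below evaluates the exact mean-square error from the first expression in (A.1) and compares it with $\tfrac{2}{3}hR^{(3)}(0+)$ for $h > 0$. It assumes the reconstructed closed form of $R$, from which $R^{(2)}(0) = -\beta^2(\alpha^2+3\beta^2)/(3\alpha^2+\beta^2)$ and $R^{(3)}(0+) = 8\beta^5/(3\alpha^2+\beta^2)$ follow by differentiation; the function names are hypothetical.

```python
import math


def R(t, alpha, beta):
    """Correlation function of the example (reconstructed form)."""
    a2, b2 = alpha * alpha, beta * beta
    gam = b2 * (a2 - b2) / (3.0 * a2 + b2)
    s = abs(t)
    return math.exp(-beta * s) * (1.0 + beta * s + gam * s * s)


def R1(t, alpha, beta):
    """First derivative of R for t >= 0."""
    a2, b2 = alpha * alpha, beta * beta
    scale = b2 * t * math.exp(-beta * t) / (3.0 * a2 + b2)
    return -scale * ((a2 + 3.0 * b2) + beta * (a2 - b2) * t)


def derivative_mse(h, alpha, beta):
    """Exact E{X'(u) - [X(u) - X(u-h)]/h}^2 for h > 0, first line of (A.1)."""
    a2, b2 = alpha * alpha, beta * beta
    R2_0 = -b2 * (a2 + 3.0 * b2) / (3.0 * a2 + b2)   # R''(0)
    return (-R2_0
            - (2.0 / h) * (0.0 - R1(h, alpha, beta))  # R'(0) = 0
            + (2.0 / h ** 2) * (1.0 - R(h, alpha, beta)))


def leading_term(h, alpha, beta):
    """Asymptotic value (2/3) h R'''(0+) from the last line of (A.1)."""
    R3_0 = 8.0 * beta ** 5 / (3.0 * alpha * alpha + beta * beta)
    return (2.0 / 3.0) * h * R3_0
```

For small $h$ the ratio of the two quantities approaches 1, with a relative deviation of order $h$, which is the linear-in-$h$ behavior responsible for the slow $1/n$ rate.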

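For the cross correlation between the approximations of the quadratic-mean derivatives (second line in (2.27)) we have with $w = u - v \ne 0$, $h > 0$, $g > 0$,

$$\begin{aligned}
&E\{X^{(1)}(u) - \tfrac{1}{h}[X(u) - X(u-h)]\}\{X^{(1)}(v) - \tfrac{1}{g}[X(v+g) - X(v)]\} \\
&\quad= -R^{(2)}(w) - \tfrac{1}{g}[R^{(1)}(w-g) - R^{(1)}(w)] - \tfrac{1}{h}[-R^{(1)}(w) + R^{(1)}(w-h)] \\
&\qquad + \tfrac{1}{gh}[R(w-g) - R(w) - R(w-h-g) + R(w-h)] \\
&\quad= -R^{(2)}(w) - \tfrac{1}{g}[-gR^{(2)}(w) + \tfrac{1}{2}g^2R^{(3)}(w) - \tfrac{1}{6}g^3R^{(4)}(w) + O(g^4)] \\
&\qquad - \tfrac{1}{h}[-hR^{(2)}(w) + \tfrac{1}{2}h^2R^{(3)}(w) - \tfrac{1}{6}h^3R^{(4)}(w) + O(h^4)] \\
&\qquad + \tfrac{1}{gh}[R^{(1)}(w)\{-g + (g+h) - h\} + R^{(2)}(w)\tfrac{1}{2}\{g^2 - (g+h)^2 + h^2\} \\
&\qquad\quad + R^{(3)}(w)\tfrac{1}{6}\{-g^3 + (g+h)^3 - h^3\} + R^{(4)}(w)\tfrac{1}{24}\{g^4 - (g+h)^4 + h^4\} + O((g+h)^5)] \\
&\quad= -\tfrac{1}{4}ghR^{(4)}(w) + O(h^3 + g^3).
\end{aligned} \tag{A.2}$$

With $u = T$, $v = -T$, $h_n = T - t_{n,n-1}$, $g_n = t_{n,2} + T$, and a regular sequence of designs, it follows from the mean value theorem that

$$\frac{1}{n} = \int_{-T}^{t_{n,2}} p(t)\,dt = p(v_n)(t_{n,2}+T), \qquad \frac{1}{n} = \int_{t_{n,n-1}}^{T} p(t)\,dt = p(u_n)(T - t_{n,n-1}), \tag{A.3}$$

where $-T < v_n < t_{n,2}$, $t_{n,n-1} < u_n < T$, and thus

$$g_n \sim \frac{1}{n\,p(-T)}, \qquad h_n \sim \frac{1}{n\,p(T)}. \tag{A.4}$$

Hence from (A.1) and (A.2) we obtain

$$\begin{aligned}
nE(\Delta_{-T,n}^2) &\to -\frac{2}{3}\frac{R^{(3)}(0-)}{p(-T)} = \frac{\rho}{3p(-T)}, \\
nE(\Delta_{T,n}^2) &\to \frac{2}{3}\frac{R^{(3)}(0+)}{p(T)} = \frac{\rho}{3p(T)}, \\
n^2E(\Delta_{-T,n}\Delta_{T,n}) &\to -\frac{R^{(4)}(2T)}{4p(-T)p(T)}.
\end{aligned} \tag{A.5}$$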


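B. Approximation of Integral

The integral approximation mean-square error can be written in the form

$$E\Big[\int_{-T}^{T} c(t)X(t)\,dt - I_n\Big]^2 = \sum_{k,j=1}^{n-1}\int_{t_k}^{t_{k+1}}\!\!\int_{t_j}^{t_{j+1}} \psi_{k,j}(t,\tau)\,p(t)p(\tau)\,dt\,d\tau = \sum_{k,j=1}^{n-1}\Psi_{k,j} \tag{A.9}$$

where

$$\begin{aligned}
\psi_{k,j}(t,\tau) ={}& f(t)f(\tau)R(t-\tau) - \tfrac{1}{2}f(t)[f(t_j)R(t-t_j) + f(t_{j+1})R(t-t_{j+1})] \\
&- \tfrac{1}{2}f(\tau)[f(t_k)R(\tau-t_k) + f(t_{k+1})R(\tau-t_{k+1})] \\
&+ \tfrac{1}{4}[f(t_k)f(t_j)R(t_k-t_j) + f(t_k)f(t_{j+1})R(t_k-t_{j+1}) \\
&\qquad + f(t_{k+1})f(t_j)R(t_{k+1}-t_j) + f(t_{k+1})f(t_{j+1})R(t_{k+1}-t_{j+1})]
\end{aligned}$$

and $f(t) = c(t)/p(t)$, and where $n$ is dropped from $t_{n,k}$ for ease of notation. From (2.12) we have by the mean value theorem

$$\int_{t_k}^{t_{k+1}} p(t)\,dt = p(u_k)\Delta t_k$$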


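where $t_k < u_k < t_{k+1}$. In the following we will make use of the quantities

$$F_{0,k} = \int_{t_k}^{t_{k+1}}\{f(t) - \tfrac{1}{2}[f(t_k) + f(t_{k+1})]\}p(t)\,dt,$$

$$F_{m,k} = \int_{t_k}^{t_{k+1}}\{f(t)(t-t_k)^m - \tfrac{1}{2}f(t_{k+1})\Delta t_k^m\}p(t)\,dt, \qquad m = 1,2,3,4,5.$$

Their asymptotics are found by Taylor expanding $(fp)(t)$ and $p(t)$ about $t_k$. Using $\sim$ to denote equality up to higher order terms in $\Delta t_k$ we find

$$F_{m,k} \sim G_m\Delta t_k^3 \ \text{ for } m = 0,1,2, \qquad F_{m,k} \sim G_m\Delta t_k^{m+1} \ \text{ for } m = 3,4,5, \tag{A.12}$$

where

$$G_0 = -\tfrac{1}{12}(f''p - f'p'), \quad \ldots, \quad G_2 = -\tfrac{1}{6}fp, \quad \ldots$$

and in these expressions all functions are evaluated at some, possibly distinct, point in $[t_k, t_{k+1}]$.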

For the diagonal terms $\Psi_{k,k}$ in (A.9) we use the Taylor expansion of $R$ about zero,

$$R(t) = R(0) + \tfrac{1}{2}R^{(2)}(0)t^2 + \tfrac{1}{6}R^{(3)}(\xi)t^3,$$

where $\xi$ is in between 0 and $t$. Since $R$ does not have a third derivative at 0, in fact $R^{(3)}(0+) = -R^{(3)}(0-) = \rho/2 > 0$, we need to keep track of whether the intermediate point $\xi$ is positive or negative. After considerable algebra we find that


k,k 


(A.h 


where  the  first  two  terms  are  of  the  order  At^^,  by  (A, 12),  and  the  third  term 
Involves  the  third  order  derivatives  of  R: 

*'k.+l 

-  //  dtdTp(t)p(T)i{R^^^f,pf(t)f(T)2(t-T)^-R^^\cpf(t)f(t^)(t-t|^)^ 

*^k 

-  R^^^f,3)f(t)f(t^_^3)(t|^^3- t)^  -  R^^^^^)f(i)f(t|^)(T  -  tj^)^ 

The intermediate points $\xi_2$ to $\xi_6$ are between 0 and $t - t_k$, $t_{k+1} - t$, ..., $\Delta t_k$ respectively, hence all positive, while $\xi_1$ is between 0 and $t - \tau$. Thus the integral of the first term should be done separately above and below the diagonal of the square $(t_k, t_{k+1}) \times (t_k, t_{k+1})$, while all other integrals can be evaluated directly on the entire square. Using the mean value theorem to pull the part of the integrand involving $R^{(3)}$, $f$ and $p$ out, and evaluating the resulting integral, we find

'^3,k  ^  R^^\pos)fpfp  ^  (neg)fpfp(-  ~) 

-  R^^\pos) f  pfp  ^  -  R^^\pos) fpf p -  R ^ ^\pos)  f pf p  ^ 

-  R  (pos)  f pf p  ^  +  R^^^ffpp  )  Atj^ 

where $f$ and $p$ are evaluated at points in $[t_k, t_{k+1}]$ and $R^{(3)}$ at points tending to 0 with $n$ whose sign is indicated. Hence $\Psi_{3,k}$ is the dominant term in (A.14) and using its expression above along with (A.11) we obtain
Vi  rR^^^((H)  R^^\o-)  R^^^(0+)  R^^^(0+) 

"  ^  k.k  "*  120  120  12  12 


T  2  2 

/  Li^> 

-T  p'(t) 


=  ■  R^^\o-)]  /  -^2^ 

-T  p  (t) 

=  ilo^I 


(A. 15) 


r"  1 


For the off-diagonal terms $\Psi_{k,j}$, $k \ne j$, in (A.9), we can Taylor expand $R$ about $t_k - t_j \ne 0$ as far as is necessary, as it is infinitely differentiable away from 0. We find that terms involving fifth and higher order derivatives of $R$ are of higher order in $(\Delta t_k)(\Delta t_j)$, and after considerable algebra we have

1 


1  121 

+  •;^R^  (t.  -  t  .)[F-  ,  F„  ,  +F-  .  F-  .  -  2F^  ,  .  ] 

2  ><■  j  0,k  2,j  2,k  0,j  0,k  0,j 


1  (31 

-  t.)[FT  ,  F-  .  -  F-  .  F-  .  +3F,  ,  F„  .  -  3F-  F,  J 
6  k  J  3,k  0,j  0,k  3,j  l,k  2,j  2,k  l,j 


1  (41 

+  -^R'  ^(t,  -  t.)[F-  ,  F,  .  +  F,  F-  .  -4F,  ,  F-  .  -  4F-  ,  F,  .  +  6F„  ,  F.  .] 
24  k  J  0,k4,j  4,k0,j  l,k3,j  3,kl,j  2,k2,j 


+  higher  order  terms. 


In fact, in view of (A.12), the terms involving $F_3$ and $F_4$ are also of higher order, and all remaining terms are of the order $\Delta t_k^3\Delta t_j^3$. Using (A.11) and (A.12), (A.13), we find


n^  I  .  >•  -W  //  I  ^  r  (t,T)R^‘"\t  -  T)  ]dtdi 

kit]  12  m=0 


(A. lb) 


where 


r.(t,T)  =  (— y(t)(— )'(t). 

0  p  p 


r  ft,T)  =  [  — (t)  +  (-)'(t)](— )’(T)  -  (~)'(t)[  — (T)  +  (-)'(T)], 
1  p  p  p  p  p  p 


r3(t,T) 


-(t)(— )’(T)  +  (— )'(t)-(T)  -  [^(t)  +  (-)'(t)][~(T)  +  (-)'([)] 
PP  PPPP  PP 


[  — (t)  +  (-)'(t)]-(T)  -  ''(t)l  -'-(T)  +  (I)'  (I)], 
P  PPPP  P 


Putting together (A.15) and (A.16) we have


n‘F.(/cX-I  )2  If  ( 

p  12  tfs  m=0 


(A. 17) 


The expression of the asymptotic constant $C_4 = C_4(s)$ can be simplified considerably by integration by parts. For instance


/  /  r  =  /  dt-(t)(/  +  /)dTR^^\t  -  r)-(T) 

tj^s  -T  P  -T  t  ^ 

T  t  T 

-  f  dJ(t){[-R^^\t  -  t)-‘-(t)]^  ^+[  "  f  +  (/  +  /)dTR^^\t  -T)(-)’(t)} 

J3.  P  P  T=-T  T=t  _7  t  P 


:  dt^CrV  -R^^’(n+)^(t)  -h  R^^^(t +T)^(-T)  -R^^^(t  -T)^(T)  (G  )-(t)] 

r  P  P  P  P  P 


f  /jR'^'ft  -  r)‘’-(t)(-)’(T)dtdT 


p  p 


f,  +  /'  (I  )  lR^^\t  4  T)'  (-T)  -  R^  (t  -  T)’  (T)  Idt 
''  P  P  P 

V 


'I'fR^^^t-  )  I  ‘  (t)  (i)  ’  (T)  -  (‘0 ’ (t)'  (I  >  Idtd  .  . 


f  /  ►  1  /  f  ^  ' 


The first term is of the same form as that coming from the diagonal terms. Using repeated integration by parts on the remaining terms we finally obtain


where 


(A. 18) 


12^C  =  R(0)[(— )^(-T)  +  (— )^(T)]  -  2R(2T)— (-T)— (T) 

P  P  P  P 

+  R^^\-2T)[— (-T)-(T)  --(-T)~(T)]  (A. 19) 

P  P  P  P 

+  2R^^\2T)-(-T)-(T)  -R^(0)[(-)^(-T)  +  (-)^(T)] 

P  P  P  P 


and $f = c/p$.

It should be noted that, as follows from the work of Sacks and Ylvisaker [4,5], the first term on the right hand side of (A.18) is the asymptotic constant of the regular sequence of designs using optimal coefficients. Thus our estimator $I_n$ of the integral is not asymptotically optimal, its asymptotic constant exceeding the least possible value by the amount $C$ determined by (A.19).


C.  Cross  Correlation  Between  Quadratic-mean  Derivative  Approximation  And 
Integral  Approximation 


Putting $\Delta_{T,n} = X^{(1)}(T) - [X(T) - X(T-h_n)]/h_n$ we have

n  '"k+l  n 

-  i,  J 

fC“J-  t,  K  -L 

k 

where 

Mj^(t)  =  f(t){-R^^\t -T)  -:|^[R(t -T)  -  R(t -T  +  h^)]} 

n 

-  yf(tj^){-R^^\tj^-T)  -R(tj^-T  +  h^)]} 

n 

-  if f'k+i  - « -  ir'^'k+i  - « -  "''k+i  -  f  • 

n 

and $f = c/p$. As in A of the Appendix, $h_n$ satisfies (A.7) with $m = 1$ or 4 or $> 4$. Since for each $k$ the argument of $R$ and $R^{(1)}$ in the expression of $M_k$ never vanishes in the interior of the interval, we can Taylor expand about $t_k - T$, and regrouping terms we have


J.  =  R^^\t.  -T)4  F  +  R^^^(t  -T)[Vf  .  +|ii„F  ) 

k  k  2  n  0,k  k  on  u,k  2  n  l,k 


+  R^^^(t,  -T)[^^F„  ,  .  +7*  F„  .] 

k  24  n  0,k  6  n  l,k  4  n  2,k 


+  higher  order  terras. 


i.e., the coefficients of $R(t_k - T)$, $R^{(1)}(t_k - T)$ vanish and those of $R^{(5)}$, etc. are of higher order in $\Delta t_k$. Then using (A.7), (A.12) and (A.13) we obtain


n™El4T,„(/cX  -  VI 


u 


24p"'(T) 


where 


T 

=  / 


T)(^)'(t)dt  +  /  R 


-T 


(t  -  T)  [— (t)  +  (-) ' (t)  ]dt  +  /  R^^^  (t  -  T)-(t)dt 
P  P  _T  P 


and  can  be  simplified  by  Integration  by  parts  to 


D  =  R^^^(0)(— )(T)  -  R^^^(-2T)(— )(-T)  +  R^^\o-)-(T)  -  R^^^  (-2T)-(-T)  . 

p  P  P  P 


When $m = 1$ we obtain the third line in (2.28), when $m = 4$ we get the third line in (2.30), and $m > 4$ helps lead to (2.32).
Figure 1. Case k = 0. Optimal coefficients, uniform sampling.

Figure 2. Case k = 0. Optimal coefficients, asymptotically optimal sampling.

Figure 4. Case k = 0. Nonoptimal coefficients, asymptotically optimal median sampling.

Figure 5. Case k = 0. Nonoptimal coefficients, asymptotically optimal median sampling.

Case k = 1. Optimal coefficients, modified uniform sampling.

Figure 10. Case k = 1. Optimal coefficients. Performance under uniform and modified uniform sampling.

Figure 12. Case k = 1. Uniform sampling. Performance of predictors with optimal and nonoptimal coefficients.